Quantifying dominant bacterial genera detected in metagenomic data from fish eggs and larvae using genus‐specific primers

Abstract The goal of this study was to design genus‐specific primers for rapid evaluation of the most abundant bacterial genera identified using amplicon‐based sequencing of the 16S rRNA gene in fish‐related samples and surrounding water. Efficient genus‐specific primers were designed for 11 bacterial genera including Alkalimarinus, Colwellia, Enterovibrio, Marinomonas, Massilia, Oleispira, Phaeobacter, Photobacterium, Polarbacerium, Pseudomonas, and Psychrobium. The specificity of the primers was confirmed by the phylogeny of the sequenced polymerase chain reaction (PCR) amplicons that indicated primers were genus‐specific except in the case of Colwellia and Phaeobacter. Copy number of the 16S rRNA gene obtained by quantitative PCR using genus‐specific primers and the relative abundance obtained by 16S rRNA gene sequencing using universal primers were well correlated for the five analyzed abundant bacterial genera. Low correlations between quantitative PCR and 16S rRNA gene sequencing for Pseudomonas were explained by the higher coverage of known Pseudomonas species by the designed genus‐specific primers than the universal primers used in 16S rRNA gene sequencing. The designed genus‐specific primers are proposed as rapid and cost‐effective tools to evaluate the most abundant bacterial genera in fish‐related or potentially other metagenomics samples.

given habitat (Janda & Abbott, 2007;Simon & Daniel, 2011). The structure of the 16S rRNA gene explains its versatility for metagenomics since universal primers can be designed in regions that are highly conserved across species and the intervening hypervariable regions can be used to assign operational taxonomic units (OTUs) to the genus taxon (Baker et al., 2003;Wang & Qian, 2009). Furthermore, the hypervariable regions offer the opportunity for the development of genus and even species targeted quantitative PCR (qPCR).
Aquatic organisms are exposed to a ubiquitous and abundant microbiota and the intensification of aquaculture has increased interest in characterizing the microbiota of fish. Initially traditional microbiological approaches based on in vitro culture were used, but more recently metagenomics approaches have been deployed (Martínez-Porchas & Vargas-Albores, 2017). Aquaculture differs from terrestrial farming systems as there is a much larger number of species exploited and this is coupled to a wide variety of environmental conditions (e.g., temperature and salinity) and geographical locations (Linhart et al., 2021). The variability of fish microbiota has been linked to geography, species, and environmental conditions and indicates that fish microbiomes have a degree of farm site-specificity (Najafpour et al., 2021). Metagenomics studies targeting fish indicate that Vibrio and Pseudomonas are the dominant bacterial genera (reviewed by Egerton et al., 2018). However, the exclusive dependence on relative abundance data from NGS can lead to misinterpretation of microbial community structure (Jian et al., 2020). For this reason, it has been proposed that the use of genus-specific primers can provide complementary, quantitative data, to corroborate NGS results and contribute to better understanding microbial diversity and population structure (Zhou et al., 2014). A general literature review of microbiome studies in fish reveals that genus-specific primers for the most representative bacterial genera with high relative abundance are unavailable. Although genusspecific primers exist for the detection of pathogen-containing genera such as Aeromonas, Vibrio, Edwardsiella, and Streptococcus, their use has not been correlated with 16S rRNA metagenomic profiles in fish (D. Zhang et al., 2014).
We previously generated 16S rRNA metagenomics datasets for Gilthead seabream (Sparus aurata) and European seabass (Dicentrarchus labrax) eggs from several commercial production sites in Europe and identified the profile of the main bacterial genera (Najafpour et al., 2021). The objective of the present study was to develop a quick, cost-effective, and practical approach for large-scale screening of the core microbiome during aquaculture production cycles. We report the design of genus-specific primers, exploiting the hypervariable characteristics of the 16S rRNA gene, for the dominant bacterial genera represented in our in-house metagenomic 16S rRNA gene datasets from eggs, larvae, live feed, and tank water samples from seabream and seabass aquaculture sites in Europe (Najafpour et al., 2021). The efficiency and specificity of the genus-specific primers were confirmed by qPCR and sequencing of the PCR amplicons, and coverage of each genus was validated by comparison of qPCR and metagenomics data (Najafpour et al., 2021).

| Selection of target genera
The summarized workflow for genus choice and primer design is presented in Figure 1. The most represented bacterial genera in seabream and seabass hatcheries in Europe were identified using inhouse metagenomic datasets of 16S rRNA gene sequences obtained from eggs, larvae, live feed, and hatchery water (sequenced by Lifesequencing S.L.-ADM, Spain and Stab Vida, Lda, Portugal). The corresponding 16S rRNA gene sequences of target genera were obtained from the LPSN and Silva (SSU r138.1) databases (Parte et al., 2020;Quast et al., 2013).

| Sequence alignment and primer design
Multiple sequence alignments of the retrieved 16S rRNA gene sequences from the LPSN and Silva databases were performed using the MUSCLE algorithm (Edgar, 2004) in the Aliview platform v 1.27 (Larsson, 2014). Genus-specific forward (Fw) and reverse (Rv) primers were designed manually to obtain primers that amplified the maximum number of species in each of the target bacterial genera.
The size of the amplicon was set at between 85 and 250 bp. Criteria used for PCR primer selection included the melting temperature (Tm), percentageof GC content, GC clamp, secondary structure, and the tendency to form primer-dimers. Primers were analyzed and optimized using the oligonucleotide sequence calculator, F I G U R E 1 The workflow followed for the design of the bacterial 16S rRNA genus-specific primers for the most abundant genera in the microbiome of fish larvae, fish eggs, zooplankton, phytoplankton, and water. The most abundant genus detected in each data set was ranked using an in-house 16S rRNA gene database. The egg microbiome profile obtained from 16S rRNA gene sequencing has previously been reported (Najafpour et al., 2021).
In general, the threshold for primer selection included ΔG > −9 to minimize the likelihood of self-dimers, hairpins, and heterodimers and 50%-55% GC content for both the Fw and Rv primers to favor specific annealing to the targeted templates (Table A1). Primers that did not comply with the selection criteria were rejected, apart from the primers for Enterovibrio for which it was not possible to meet all the criteria (Table A1).
To further confirm that the designed genus-specific primer pairs would anneal to the maximal number of species in a given genus, in silico PCR simulations using TestPrime 1.0, available in the SILVA platform, was used (Klindworth et al., 2013). The primer pairs with the maximum specificity for each taxonomic group and with the best match to the selection criteria were selected and synthesized (Table A2, Specanalitica, Carcavelos, Portugal). On arrival, primers were resuspended in sterile, nuclease-free water to prepare 100 µM stocks (Table A2).

| Evaluation of primer specificity by PCR amplification
The performance of the genus-specific primers was initially evaluated by running conventional PCR (Bio-Rad, T100 Thermal Cycler).
Genomic DNA was extracted from seabream and seabass eggs, whole larvae, and rotifers (feed) using a DNeasy Blood & Tissue Kit (Qiagen) as previously described (Najafpour et al., 2021). The PCR was carried out using 2 µl (approximately 40-80 ng) of genomic DNA, 2.5 µl of 10× Dream Taq Green Buffer containing 20 mM MgCl 2 (Thermo Scientific), 0.5 µl of nucleoside triphosphate 10 µM stock, 1.25 µl of the Fw primer (10 µM), 1.25 µl of the Rv primer (10 µM), 0.2 µl of Dream Taq DNA polymerase (5 U/µl, Thermo Scientific), and 17.3 µl of sterile, nuclease-free water to give a final reaction volume of 25 µl. The PCR thermocycle consisted of 1 cycle of 95°C for 3 min followed by 34 cycles of 95°C for 10 s, a gradient of melting temperatures tested for each primer pair (57-64°C) for 10 s and 72°C for 10 s and a final cycle at 72°C for 5 min. The genus-specific primers were tested in PCR amplification of genomic DNA extracted from larval intestines, whole larvae, rotifers, and egg samples of seabream and seabass that had a high relative abundance of a given bacterial genus in 16S rRNA metagenomics.
The 16S rRNA gene PCR reaction products were run on a 2% agarose gel in tris-acetate-EDTA buffer and the specific amplicons generated by each primer pair were purified using an Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare) following the manufacturer's instructions. The purified PCR products were ligated into the pGEM-T Easy cloning vector (Promega) which permits colony selection based on white/blue color and resistance to the antibiotic ampicillin. The ligation reactions were performed overnight at 4°C in a 10 µl reaction mix containing 5 µl 2× Rapid Ligation buffer, 50 ng/µl pGEM ® -T Easy Vector, 1.5 U of T4 DNA Ligase (Promega), the optimal concentration of each of the purified PCR amplicons (between 19.8 and 80.5 ng) and sterile water, to give a final reaction volume of 10 µl. DH5α competent cells (Escherichia coli) were transformed with 5 µl of the ligation reaction, plated on lysogeny broth agar containing 75 µg/ml ampicillin, 0.5 mM isopropyli β-d-1thiogalactopyranoside, and 80 μg/ml X-Gal (5-bromo-4-chloro-3indolyl β-D-Galactopyranoside), and incubated at 37°C overnight.
Minipreps of plasmid DNA were prepared from white colonies and at least two colonies per genus were sequenced using the Sanger method.
The primer specificity and performance were evaluated by assigning taxonomy to each amplicon sequence using the SILVA 16S rRNA gene database as a reference and phylogenetic tree generation, based on the SILVA workflow (de novo including neighbors). For the phylogenetic tree, the sequences of the selected colonies were aligned using the Randomized Axelerated Maximum Likelihood (RAxML) tool with a generalised time reversible (GTR) model and Gamma rate model for likelihoods (Stamatakis, 2014).

| Real-time qPCR optimization and primer efficiency
After confirming primer specificity by amplicon sequencing, qPCR reactions were optimized using a Bio-Rad CFX96 qPCR Instrument (Bio-Rad Laboratories). The primer performance was initially evaluated using a gradient of melting temperatures (T m ) in a reaction volume of 10 µl containing 200 nM of each primer, 2 µl of DNA (80 ng DNA/2 µl) for the samples, or 2 µl of serial dilutions (corresponding to 10 2 -10 7 template copies in the reaction) for the standard curve, 5 µl of 2× Forget-Me-Not™ EvaGreen ® qPCR Master Mix (Biotium) and 2.4 µl of sterile nuclease-free water. Thermocycling conditions were 95°C for 2 min, followed by 40 cycles of 95°C for 5 s, the optimized melting temperature for each primer pair for 10 s (between 58 and 61°C) and 72°C for 10 s. A final melting curve was generated by increasing the temperature up to 95°C in increments of 0.5°C every 10 s to confirm single reaction products were obtained. Control reactions included substitution of genomic DNA by water to confirm the absence of contamination.

| Bacterial genus quantification and correlation with 16S rRNA microbiome profiling
To compare qPCR and 16S rRNA gene abundance estimates, Spearman correlations were used together with scatter plots generated using the R package ggplot2 v 3.3.5 (Wickham, 2016).
Massilia, Phaeobacter, and Pseudomonas were quantified in seabream (n = 9; three larvae samples in the age range of 5-15 days posthatch [dph] and six samples in the age range of 43-58 dph) and seabass (n = 15; seven larvae samples in the age range of 5-7 dph and eight samples in the age range of 42-46 dph) larvae (Table A3). In the case NAJAFPOUR ET AL. | 3 of 14 of Pseudomonas where the correlation between 16S rRNA gene sequencing and copy number determined by qPCR was low, further in silico analysis was done to assess genus-specific primer performance in comparison to the universal primers used for 16S rRNA metagenomics studies. In silico PCR v 0.5.1 implemented in Ubuntu (20.04.2 LTS) and unique Pseudomonas 16S rRNA gene sequences from the Silva database with a minimum length of 900 bp were used.

| Primer specificity
In total, 11 bacterial genera were targeted based on their high relative abundance in fish-related samples as determined by 16S rRNA gene sequencing ( Table 1). The species for which the 16S rRNA genes were readily amplified by each primer pair are indicated in Table A4

| Primer efficiency and correlation between qPCR and 16S RNA gene abundance
The efficiency of bacterial genus-specific primers was evaluated using qPCR with the optimized annealing temperature of each primer pair ( Table 2). All the genus-specific primers had an acceptable efficiency within the range of 92%-105.5%.
The five most abundant bacterial genera in seabream and seabass eggs were quantified. In general, the relative abundance profiles of the 16S rRNA gene sequencing and qPCR amplification of different bacterial genera were matched with some exceptions    Several studies have focused on Pseudomonas spp. due to their widespread distribution and the presence of species that are human, animal, and plant pathogens (Palleroni, 2015). For example, P. bactica is a pathogen of marine fish, and primers targeting the gyrB gene (c390-F1 and c390-R1) and rpoD gene have been developed for rapid diagnosis (López et al., 2016). A multiplex PCR based on oprI and oprL genes was developed for the detection of Pseudomonas strains from a bacterial collection isolated from water (Matthijs et al., 2013). The qPCR comparisons, in the present study, of the abundance of five bacterial genera (Colwellia, Oleispira, Pseudomonas, Psychrobium, Phaeobacter) in the egg samples revealed a higher than expected copy number of the Pseudomonas genus relative to the results of the   In seabream and seabass eggs, Pseudophaeobacter was relatively more abundant than Phaeobacter, which was not detected by 16S rRNA gene sequencing in eight out of nine egg samples (Najafpour et al., 2021). In contrast, Phaeobacter was more common in 16S rRNA gene sequences of seabream and seabass larvae, and the Pseudophaeobacter genus was not detected (in-house database). Phaeobacter spp. are common in marine organisms and the environment (Martens et al., 2006;Yoon et al., 2007;D. C. Zhang et al., 2008)  F I G U R E 3 Analysis of the efficiency of five genus-specific primer sets by comparing the relative abundance of the genus detected in seabream and seabass eggs using 16S rRNA gene sequencing and the copy number determined by qPCR. (a) The relative abundance (%) of five bacterial genera determined by 16S rRNA gene sequencing in nine egg samples from seabream (SA, n = 6) and seabass (DL, n = 3) from three different aquaculture sites (S1, S2, and S3). (b) Quantitative analysis of five bacterial genera in nine egg samples from seabream (n = 6) and seabass (n = 3) using qPCR with the designed genus-specific primers. Full details of the experimental design, DNA extraction method, and the microbiome profile obtained using 16S rRNA gene sequencing of eggs samples are available in the Najafpour et al. (2021). The graphical plots were generated in the R package ggplot2 v 3.3.5. qPCR, quantitative polymerase chain.
Nonetheless, the identified genera were well represented in seabream and seabass larvae, rotifers, and tank water, and therefore genus-specific primers were validated in qPCR. The designed genusspecific primers revealed a high relative abundance of the Massilia

F I G U R E 4
The correlation between genus-specific copy number established by qPCR quantification and relative abundance of six bacterial genera determined by 16S rRNA gene sequencing of seabream and seabass egg samples. The scatter plots were generated using the "Spearman" method in an R environment. Two sets of samples and databases were used for the correlation analysis.
(1) For the genera, Colwellia, Oleispira, and Psychrobium, the correlation analysis was established by comparing the relative abundance of the genera in nine egg samples of seabass (n = 3) and seabream (n = 6) obtained from 16S rRNA gene sequencing (Najafpour et al., 2021) and the copy number determined using genus-specific primers and qPCR.
(2) For the genera, Massilia, Phaeobacter, and Pseudomonas, the correlation analysis was established by comparing the relative abundance of the genera in 24 larval samples obtained from 16S rRNA gene sequencing (in-house database) and copy number determined using genus-specific primers and qPCR. The list of samples used in the analysis is available in Table A3. qPCR, quantitative polymerase chain.
are widespread in soil (Y. Q. Zhang et al., 2006), drinking water (Gallego et al., 2006), and plants (Ofek et al., 2012) and was a dominant genus in the gastrointestinal microbiota of the herbivorous grass carp, Ctenopharyngodon idellus . Marinomonas species are abundant in seawater (Ivanova et al., 2005;Yoon et al., 2005) and were highly abundant in the microbial community of rotifer cultures prepared as a feed for fish larvae (Rombaut et al., 2001). Polaribacter species have been isolated from Antarctic soil and in biofilms on stones from the North Sea (Choo et al., 2020;Kim et al., 2013), and the genus was detected in the intestine of marine organisms and algae (Hyun et al., 2014;Nedashkovskaya et al., 2013;Wei et al., 2018). Alkalimarinus sediminis was isolated from marine sediment in Shandong Province, China (Zhao et al., 2015), and Alkalimarinus species were found in bone-eating worms (Osedax mediterranea) from the Mediterranean Sea (Hewitt et al., 2020).
Photobacterium has been isolated from seawater, mussel, eggs of spiny lobster and fish intestine and the bioluminescence and pathogenicity of some species (e.g., Photobacterium damselae) has made studies of them a priority (Egerton et al., 2018;Labella et al., 2017;Osorio et al., 1999).

ACKNOWLEDGMENTS
The authors acknowledge all the aquaculture hatcheries for supplying the experiment samples. The authors also thank João C. R. Cardoso of the PT ALG-01-0145-FEDER-022121.

CONFLICTS OF INTEREST
None declared.

ETHICS STATEMENT
None required. Abbreviations: dph, days posthatch; qPCR, quantitative polymerase chain reaction. a Nine egg samples (E1-E9) used for qPCR reactions of five bacterial genera (Pseudomonas, Oleispira, Colwellia, Psychrobium, and Phaeobacter), and the details of their 16S rRNA gene sequencing profiles are available in Najafpour et al. (2021). "Factors driving bacterial microbiota of eggs from commercial hatcheries of European seabass and gilthead seabream" and the metagenomics raw data were deposited at NCBI SRA (sequence read archive) under project number PRJNA727018. Twenty-four larvae samples (L1-L24) and their corresponding qPCR and 16S rRNA gene sequencing data (available as inhouse datasets) were also used to correlate qPCR and 16S rRNA gene abundance estimates of three genera (Massilia, Phaeobacter, and Pseudomonas). b For larvae samples, the average number of the sequenced reads is presented for each age range.